class: center, middle, inverse, title-slide

.title[
# STA 235H - Review Session Midterm
]
.subtitle[
## Fall 2023
]
.author[
### McCombs School of Business, UT Austin
]

---

<!-- <script type="text/javascript"> -->
<!-- MathJax.Hub.Config({ -->
<!-- "HTML-CSS": { -->
<!-- preferredFont: null, -->
<!-- webFont: "Neo-Euler" -->
<!-- } -->
<!-- }); -->
<!-- </script> -->

<style type="text/css">
.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}

.tiny .remark-code { /*Change made here*/
  font-size: 90% !important;
}
</style>

# Structure

- We will review **.darkorange[causal inference]**

- For questions about regression interpretation, check out the **.darkorange[slides uploaded to the course website]**

--

- We will talk about RCTs, selection on observables, Diff-in-Diff, and RD (if there's time).

--

- You will **.darkorange[discuss with a group]** and come up with answers. We will discuss them together.

---

<br>
<br>

.box-2Trans[Participate!]

.box-2trans[Even if you make a mistake, everyone can learn from that]

--

.box-4Trans[Ask questions!]

.box-4trans[You are here on a Friday... take advantage of it :)]

---

# Exercise: Rain Barrels

**.darkorange[Context]**

- The metropolitan Austin area is interested in helping residents become more environmentally conscious, reduce their water consumption, and save money on their monthly water bills.

- To do this, Hays, Comal, Guadalupe, and Travis counties have jointly initiated a new program that provides free rain barrels to families who request them. These barrels collect rain water, and the reclaimed water can be used for non-potable purposes (like watering lawns and gardens). Officials hope that families that use the barrels will rely more on rain water and will subsequently use fewer county water resources, thus saving both the families and the counties money.

- Being evaluation-minded, the counties hired a consultant (you!)
before rolling out their program, and you convinced them to fund and run a randomized controlled trial (RCT) during 2021 using a random sample of families within these counties.

---

# Exercise: Rain Barrels

Your RCT dataset contains the following variables:

.small[
- `id`: A unique ID number for each household
- `water_bill`: The family’s average monthly water bill, in dollars
- `barrel`: A factor variable showing if the family participated in the program
- `yard_size`: The size of the family’s yard, in square feet
- `home_garden`: A character variable showing if the family has a home garden
- `attitude_env`: The family’s self-reported attitude toward the environment, on a scale of 1-10 (10 meaning highest regard for the environment)
- `temperature`: The average outside temperature
]

1) What are your treatment group and your control group? What is your outcome?

--

2) What should your balance table look like?

--

3) How would you estimate the treatment effect in this case?

---

# Exercise: Rain Barrels

These are your results from the RCT:

.small[
```
##
## Call:
## lm(formula = water_bill ~ barrel, data = barrels_rct)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -88.239 -21.062  -1.299  20.558  79.191
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  228.442      2.038  112.07   <2e-16 ***
## barrel       -40.573      2.744  -14.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.3 on 491 degrees of freedom
## Multiple R-squared:  0.308,  Adjusted R-squared:  0.3066
## F-statistic: 218.6 on 1 and 491 DF,  p-value: < 2.2e-16
```
]

- Interpret the relevant coefficient.

---

# Exercise: Rain Barrels

**.darkorange[Context]**

- Imagine you ran your RCT and found that it had a positive effect, so the counties decide to roll it out and offer it to everyone. <u>Note that not everyone had to take it</u>, but they could opt into it if they wanted.

--

1) Do you think that the effect that we find in the RCT should be the same as the effect for the entire population? (*Hint: Think about generalizability*)

--

2) Who do you think is more likely to opt into this program?

--

3) Can we compare people who opt in vs those who don't opt in to get a causal effect? Why or why not?

---

# Exercise: Rain Barrels

**.darkorange[Context]**

- You decide to do matching between people who opt in (treatment group) and those who do not (control group), and match on environmental attitude, temperature, yard size, and home garden.

--

1) Can you estimate a causal effect using your matched sample? Why or why not?

--

2) What else could you be missing?

---

# Exercise: Rain Barrels

These are your results from matching:

.small[
```
##
## Call:
## lm(formula = water_bill ~ barrel, data = barrels_matched)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -89.16 -17.90  -1.01  20.95  79.76
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  229.165      1.299  176.37   <2e-16 ***
## barrel       -34.225      1.838  -18.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.2 on 1008 degrees of freedom
## Multiple R-squared:  0.256,  Adjusted R-squared:  0.2553
## F-statistic: 346.9 on 1 and 1008 DF,  p-value: < 2.2e-16
```
]

- Interpret the relevant coefficient.

---

# Exercise: Rain Barrels

**.darkorange[Context]**

You now realize that there was another neighboring county, Bexar, that implemented this program in 2019, and you have data for all these counties for the period 2018-2019. You think this would be a great setup for a Diff-in-Diff analysis!

--

1) What would your treatment group be? And your control group?

--

2) What two variables would you need to create and how would you do it?
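---

# Exercise: Rain Barrels

One possible way to build the two indicator variables, as a minimal sketch. The column names `county` and `year` and the toy values are assumptions for illustration (they are not part of the dataset description), and treating 2019 as the post period is an assumption based on the context above.

```r
# Hypothetical toy data: one row per household-year (values are made up)
barrels_dd <- data.frame(
  county     = c("Bexar", "Bexar", "Travis", "Travis", "Hays", "Hays"),
  year       = c(2018, 2019, 2018, 2019, 2018, 2019),
  water_bill = c(250, 220, 225, 240, 230, 242)
)

# treat: 1 for the county that implemented the program (Bexar), 0 otherwise
barrels_dd$treat <- as.numeric(barrels_dd$county == "Bexar")

# post: 1 for the period once the program is in place (assumed: 2019), 0 otherwise
barrels_dd$post <- as.numeric(barrels_dd$year >= 2019)

# Same specification as in the slides: main effects plus the interaction
dd_fit <- lm(water_bill ~ treat * post, data = barrels_dd)

# The interaction coefficient is the diff-in-diff estimate
coef(dd_fit)[["treat:post"]]
```

The `treat:post` coefficient only has a causal interpretation if the parallel trends assumption is plausible.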
---

# Exercise: Rain Barrels

These are your results from diff-in-diff:

.small[
```
##
## Call:
## lm(formula = water_bill ~ treat * post, data = barrels_dd)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -93.762 -20.699  -0.708  21.922  79.809
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  224.800      1.080  208.18   <2e-16 ***
## treat         22.155      1.527   14.51   <2e-16 ***
## post          15.012      1.527    9.83   <2e-16 ***
## treat:post   -44.807      2.160  -20.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29.3 on 2940 degrees of freedom
## Multiple R-squared:  0.1397,  Adjusted R-squared:  0.1388
## F-statistic: 159.1 on 3 and 2940 DF,  p-value: < 2.2e-16
```
]

- Interpret the relevant coefficient.

---

# Exercise: Rain Barrels

**.darkorange[Context]**

- Imagine now Austin decides to give a rain barrel to every household with an annual income lower than $50,000. You think this would be a good setup for a regression discontinuity design

--

1) For whom are we estimating an effect in this case?

--

2) What are some checks that we should do *before* the RDD analysis?

---

# Exercise: Rain Barrels

.small[
```
##
## Call:
## lm(formula = water_bill ~ treat * dist, data = barrels_rd)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -70.246 -16.468   0.527  15.502  66.702
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.030e+02  2.253e+00  90.122  < 2e-16 ***
## treat       -2.113e+01  3.386e+00  -6.241 9.45e-10 ***
## dist        -2.954e-03  2.266e-04 -13.035  < 2e-16 ***
## treat:dist   1.189e-04  3.466e-04   0.343    0.732
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.42 on 489 degrees of freedom
## Multiple R-squared:  0.7347,  Adjusted R-squared:  0.7331
## F-statistic: 451.4 on 3 and 489 DF,  p-value: < 2.2e-16
```
]

- Interpret the relevant coefficient.
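---

# Exercise: Rain Barrels

A sketch of how `treat` and `dist` could be built for a regression like the one above. The `income` column, the toy values, and centering the running variable at $50,000 are assumptions for illustration; the variables in `barrels_rd` may have been defined differently.

```r
# Hypothetical toy data; `income` is an assumed column (annual income, in dollars)
barrels_rd <- data.frame(
  income     = c(42000, 46000, 49000, 51000, 55000, 58000),
  water_bill = c(210, 200, 185, 200, 190, 180)
)

cutoff <- 50000

# dist: running variable centered at the cutoff (0 means exactly at the cutoff)
barrels_rd$dist <- barrels_rd$income - cutoff

# treat: 1 for households below the income cutoff (they receive a barrel)
barrels_rd$treat <- as.numeric(barrels_rd$income < cutoff)

# Linear fit with a separate slope on each side of the cutoff
rd_fit <- lm(water_bill ~ treat * dist, data = barrels_rd)

# With dist centered at the cutoff, this is the estimated jump at the cutoff
coef(rd_fit)[["treat"]]
```

Because `dist` is centered, the `treat` coefficient is the gap between the two fitted lines at `dist = 0`, so it describes households right around the cutoff.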